Data requirement
MS data
The MS data should be placed as mzXML files in the raw_data folder of the analyzed folder (such as ./data/QTOF_6600_demo/neg/raw_data). mzXML is a file format commonly used for storing mass spectrometry data, particularly from LC-MS experiments. It records the mass spectra, including m/z (mass-to-charge ratio) values, ion intensities, and retention times.
Convert raw data to mzXML
Use ProteoWizard (MSConvert) to convert vendor LC-MS data files (.raw or .wiff) into mzXML format.
Click the Browse button to select the files you want to process, then click the Add button.
Click the Output format button and select mzXML as the file format to generate.
In the Filters option, select Peak Picking, set MS levels to 1-2, and click the Add button.
Click the Start button.
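For batch conversion, the GUI steps above can be approximated from R by calling the ProteoWizard command-line tool. This is a minimal sketch, assuming msconvert is installed and on the PATH. The helper build_msconvert_args and the input file names are hypothetical, and the peakPicking filter string may need adjusting for your ProteoWizard version.

```r
# Build msconvert arguments that mirror the GUI steps above.
# Assumption: the ProteoWizard command-line tool "msconvert" is on the PATH.
build_msconvert_args <- function(input_files, output_dir, format = "mzXML") {
  c(shQuote(input_files),
    paste0("--", format),                                   # output format, e.g. --mzXML or --mgf
    "--filter", shQuote("peakPicking vendor msLevel=1-2"),  # peak picking on MS levels 1-2
    "-o", shQuote(output_dir))                              # output folder
}

input_files <- c("sample_1.wiff", "sample_2.wiff")  # hypothetical file names
args <- build_msconvert_args(input_files, "./data/QTOF_6600_demo/neg/raw_data")
cat("msconvert", args, "\n")

# Run the conversion only when msconvert is available and the inputs exist.
if (nzchar(Sys.which("msconvert")) && all(file.exists(input_files))) {
  system2("msconvert", args)
}
```

Passing format = "mgf" to the same helper would request mgf output instead.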

Peak table
The peak table should be placed in the analyzed folder (such as ./data/QTOF_6600_demo/neg) as a csv file. It records peak information, such as id, mz, rt, rtmin, rtmax, per-sample intensities (n_293T_2 in the example below), and mean_inten.
read.csv("../NetID/get started/UserGuideMD/peak_table.csv")[1:3,]
## id mz rt rtmin rtmax n_293T_2 mean_inten
## 1 1531 71.0140 9.49 9.05 9.62 12000 12000
## 2 2426 71.0141 11.66 11.54 11.79 1800 1800
## 3 1664 72.0091 9.80 9.73 9.87 1200 1200
You can use any software to obtain your own peak table, as long as the result is converted to the format above. If rtmin and rtmax are not known, use rt ± a fixed value (e.g. ± 0.5). Do not remove isotopes, adducts, or fragments from the peak table, as they carry important information for NetID global optimization.
We provide our workflow for obtaining the peak table in section 5.
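The formatting rule above (fill unknown rtmin and rtmax with rt ± a fixed value) can be sketched in base R. The helper prepare_peak_table and its list of required columns are assumptions made for illustration, not part of NetID. The toy table reuses the rows shown above, with some rtmin and rtmax values blanked on purpose.

```r
# Toy peak table based on the rows shown above; some rtmin/rtmax values are set to NA on purpose.
peak_table <- data.frame(
  id         = c(1531, 2426, 1664),
  mz         = c(71.0140, 71.0141, 72.0091),
  rt         = c(9.49, 11.66, 9.80),
  rtmin      = c(9.05, NA, NA),
  rtmax      = c(9.62, NA, NA),
  n_293T_2   = c(12000, 1800, 1200),
  mean_inten = c(12000, 1800, 1200)
)

# Hypothetical helper: check the expected columns, then fill unknown
# rtmin/rtmax with rt -/+ half_width.
prepare_peak_table <- function(tab, half_width = 0.5) {
  required <- c("id", "mz", "rt", "mean_inten")  # assumed minimal set
  missing_cols <- setdiff(required, names(tab))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!"rtmin" %in% names(tab)) tab$rtmin <- NA_real_
  if (!"rtmax" %in% names(tab)) tab$rtmax <- NA_real_
  tab$rtmin <- ifelse(is.na(tab$rtmin), tab$rt - half_width, tab$rtmin)
  tab$rtmax <- ifelse(is.na(tab$rtmax), tab$rt + half_width, tab$rtmax)
  tab
}

peak_table <- prepare_peak_table(peak_table)
print(peak_table)
```

Known rtmin and rtmax values are left untouched; only missing entries are filled.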
MS2 data
The MS2 data files should be placed as mgf files in the MS2_Data subfolder of raw_data in the analyzed folder (such as ./data/QTOF_6600_demo/neg/raw_data/MS2_Data). Use ProteoWizard to convert a data-dependent acquisition file to mgf.
Click the Output format button and select mgf format.
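Before running NetID, it can help to confirm that the input files sit where this section says they should. The sketch below is one way to do that in base R. The function check_input_layout is hypothetical, and the demo builds a temporary folder with made-up file names that mimics the layout of ./data/QTOF_6600_demo/neg.

```r
# Hypothetical helper: list the input files found in the expected locations.
check_input_layout <- function(analysis_dir) {
  raw_dir <- file.path(analysis_dir, "raw_data")
  ms2_dir <- file.path(raw_dir, "MS2_Data")
  found <- list(
    mzXML      = list.files(raw_dir, pattern = "\\.mzXML$", ignore.case = TRUE),
    peak_table = list.files(analysis_dir, pattern = "\\.csv$", ignore.case = TRUE),
    mgf        = list.files(ms2_dir, pattern = "\\.mgf$", ignore.case = TRUE)
  )
  for (kind in names(found)) {
    message(sprintf("%-10s %d file(s)", kind, length(found[[kind]])))
  }
  invisible(found)
}

# Demo on a temporary folder (file names are made up for illustration).
demo_dir <- file.path(tempdir(), "layout_demo", "neg")
dir.create(file.path(demo_dir, "raw_data", "MS2_Data"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "raw_data", "sample_1.mzXML"))
file.create(file.path(demo_dir, "raw_data", "MS2_Data", "sample_1.mgf"))
file.create(file.path(demo_dir, "peak_table.csv"))

layout <- check_input_layout(demo_dir)
```

A count of zero for any entry points to a misplaced or unconverted file.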

Reference compound library
The compound library is stored in the dependence folder as an rds file. It records compound information, such as accession, name, SMILES, status, formula, mass, rdbe, etc. The entries are extracted from two compound libraries, HMDB and PubChemLite, and merged.
readRDS("../NetID/dependence/hmdb_pubchemlite_merge_result_simple.rds")[1:3, -c(9,10)]
## accession name SMILES formula mass
## 1 HMDB0000001 1-Methylhistidine CN1C=NC(C[C@H](N)C(O)=O)=C1 C7H11N3O2 169.0851
## 2 HMDB0000002 1,3-Diaminopropane NCCCN C3H10N2 74.0844
## 3 HMDB0000005 2-Ketobutyric acid CCC(=O)C(O)=O C4H6O3 102.0317
## rdbe PubMed_Count status FirstBlock category
## 1 4 94 quantified BRMWTNUJHUMWMS Metabolite
## 2 0 575 quantified XFNJVJPLKCPIBV Metabolite
## 3 2 328 quantified TYEYBOSBBBHJIV Metabolite
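The rdbe column holds ring and double bond equivalents. As a worked check, the sketch below recomputes it from the formula column using the common expression C - H/2 + N/2 + 1, which holds for compounds made of C, H, N, O, and S with standard valences. How NetID computes the column internally is not stated here, and the helpers parse_formula and rdbe are hypothetical.

```r
# Parse a simple molecular formula such as "C7H11N3O2" into element counts.
# Assumption: no brackets, charges, or isotope labels in the formula string.
parse_formula <- function(formula) {
  tokens   <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  digits   <- sub("^[A-Za-z]+", "", tokens)
  counts   <- ifelse(digits == "", 1, suppressWarnings(as.numeric(digits)))
  vapply(split(counts, elements), sum, numeric(1))
}

# Ring and double bond equivalents for C/H/N/O/S compounds.
rdbe <- function(formula) {
  n <- parse_formula(formula)
  count <- function(el) if (el %in% names(n)) n[[el]] else 0
  count("C") - count("H") / 2 + count("N") / 2 + 1
}

formulas <- c("C7H11N3O2", "C3H10N2", "C4H6O3")  # the three rows shown above
print(sapply(formulas, rdbe))
```

For these three formulas the result is 4, 0, and 2, which agrees with the rdbe column printed above.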
Known library
The known library is stored in the dependence folder as a csv file. It records the retention times of known metabolites. We provide our in-house RT table for the demo.
readRDS("../NetID/get started/UserGuideMD/RT_library.rds")
## accession name SMILES
## 1 HMDB0000001 1-Methylhistidine CN1C=NC(C[C@H](N)C(O)=O)=C1
## 2 HMDB0000005 2-Ketobutyric acid CCC(=O)C(O)=O
## 3 HMDB0000008 2-Hydroxybutyric acid CC[C@H](O)C(O)=O
## 4 HMDB0000011 3-Hydroxybutyric acid C[C@@H](O)CC(O)=O
## 5 HMDB0000012 Deoxyuridine OC[C@H]1O[C@H](C[C@@H]1O)N1C=CC(=O)NC1=O
## formula mass rdbe status Hilicon
## 1 C7H11N3O2 169.0851 4 quantified 8.658
## 2 C4H6O3 102.0317 2 quantified 4.159
## 3 C4H8O3 104.0473 1 quantified 5.698
## 4 C4H8O3 104.0473 1 quantified 7.336
## 5 C9H12N2O5 228.0746 5 quantified 3.596
MS2 spectral library
The MS2 spectral library is stored in the dependence folder as an rds file. It records the reference MS2 spectra. We downloaded MS2 spectra from HMDB and converted them into the format shown below.
readRDS("../NetID/get started/UserGuideMD/MS2_library.rds")[1]
## [[1]]
## [[1]]$spectrum
## mz intensity formula
## 1 101.0244 1 C4H6O3
##
## [[1]]$notes
## character(0)
##
## [[1]]$formula
## [1] "C4H6O3"
##
## [[1]]$external_id
## [1] "HMDB0000005"
##
## [[1]]$SMILES
## [1] "CCC(=O)C(O)=O"
##
## [[1]]$precursor_mz
## [1] 101.0244
##
## [[1]]$polarity
## [1] "negative"
##
## [[1]]$instrument_type
## [1] "LC-ESI-QTOF (UPLC Q-Tof Premier, Waters)"
##
## [[1]]$collision_energy_level
## character(0)
##
## [[1]]$collision_energy_voltage
## character(0)
##
## [[1]]$adduct
## [1] "M-H"
##
## [[1]]$data_source
## [1] "Experimental spectra from HMDB"
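The precursor_mz field of the entry above is consistent with the neutral mass of the same compound in the reference compound library. The short sketch below reproduces that arithmetic for the M-H adduct and adds a ppm error helper. The names proton_mass and ppm_error are introduced here for illustration, and the sketch assumes a singly charged ion.

```r
proton_mass  <- 1.007276   # mass of a proton in Da
neutral_mass <- 102.0317   # 2-Ketobutyric acid (HMDB0000005), from the compound library

# [M-H]-: the neutral molecule loses one proton in negative mode.
precursor_mz <- neutral_mass - proton_mass
print(round(precursor_mz, 4))

# Relative mass error in parts per million between an observed and a theoretical m/z.
ppm_error <- function(observed, theoretical) {
  (observed - theoretical) / theoretical * 1e6
}

# Compare the library value (101.0244) with the computed one.
print(ppm_error(101.0244, precursor_mz))
```

The computed value rounds to 101.0244, the precursor_mz stored in the library entry.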
Main steps and successful result
Set NodeSet and EdgeSet
Setting up NodeSet and EdgeSet...
[1] "3453 negative nodes"
Adduct Biotransform Fragment Natural_abundance Radical
6572 8274 661 8617 591
EdgeSet_expand
Setting up LibrarySet and StructureSet...
start EdgeSet_expand...
[1] "oligomer_multicharge 278"
[1] "experiment_MS2_fragment 4849"
[1] "library_MS2_fragment 7"
Candidate formula pool propagation and scoring...
[1] "Step 1 elapsed="
Time difference of 5.939358 secs
[1] "Step 2 elapsed="
Time difference of 15.9693 secs
[1] "Step 0.01 elapsed="
Time difference of 27.72133 secs
[1] "Step 0.02 elapsed="
Time difference of 42.09879 secs
[1] "Step 0.03 elapsed="
Time difference of 1.11554 mins
[1] "Step 1.01 elapsed="
Time difference of 1.398908 mins
[1] "Step 1.02 elapsed="
Time difference of 1.706977 mins
[1] "Step 1.03 elapsed="
Time difference of 2.239343 mins
CplexSet and ilp_nodes and ilp_edge
setting up CplexSet...
[1] "Finish CplexSet initialization"
[1] "Finish CplexSet scoring"
[1] "Complexity is 264379 variables and 368739 constraints."
nc nr CPX_MAX obj rhs sense beg cnt ind val lb ub ctype
1 1 1 264379 368739 368739 264379 264379 836463 836463 264379 264379 264379
mat
6
Run optimization.
No error messages should appear in the console. If the optimization step succeeds, you will see messages in the following format.
Run optimization...
[1] "Optimization ended successfull - optimal - OBJ_value = 1989.44"
41.87 sec elapsed
path annotation...
Time difference of 7.028952 mins
NetID total run time:Time difference of 12.96994 mins
Output
The following files will be generated in the neg (or pos)/NetID_output folder. On a typical desktop computer, the run should finish within an hour.
* NetID_annotation.csv contains the annotation information for each peak.
* cyto_nodes.csv and cyto_edges.csv record all the information about the nodes and edges that make up the molecular network.
* NetID_output.RData stores all variables from the run. This file is used for network visualization in the Shiny R app.
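After a run, a quick way to confirm that the files in the list above were written is to test for them from R. This is a minimal sketch; the function check_netid_output is hypothetical and only looks for the file names given above.

```r
# Hypothetical helper: report which expected output files exist.
check_netid_output <- function(analysis_dir) {
  out_dir  <- file.path(analysis_dir, "NetID_output")
  expected <- c("NetID_annotation.csv", "cyto_nodes.csv",
                "cyto_edges.csv", "NetID_output.RData")
  present <- file.exists(file.path(out_dir, expected))
  setNames(present, expected)
}

status <- check_netid_output("./data/QTOF_6600_demo/neg")
print(status)

# Peek at the annotation table only if it is there.
if (status[["NetID_annotation.csv"]]) {
  annotation <- read.csv(file.path("./data/QTOF_6600_demo/neg",
                                   "NetID_output", "NetID_annotation.csv"))
  print(head(annotation))
}
```

Any FALSE entry means the corresponding file is missing, which usually indicates that an earlier step stopped with an error.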
Customized settings
Customized known library
In the dependence folder, open the known_library.csv file. Add or remove metabolites as you wish. To provide RT information, add a column that records retention time. Multiple RT lists can be stored by adding further columns. An entry may have an empty retention time.
In NetID_run_script.R, set LC_method to the name of the RT column that NetID should use.
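The edit described above can also be scripted. The sketch below adds a second RT column to a small known-library table and selects it by name, which is how LC_method is described as working. The column name My_C18 and its RT values are made up for illustration, the table is a cut-down copy of the demo rows, and how NetID consumes the column internally is not shown here.

```r
# Cut-down copy of the demo known library (Hilicon RTs taken from the table shown earlier).
known_library <- data.frame(
  accession = c("HMDB0000001", "HMDB0000005", "HMDB0000008"),
  name      = c("1-Methylhistidine", "2-Ketobutyric acid", "2-Hydroxybutyric acid"),
  Hilicon   = c(8.658, 4.159, 5.698)
)

# Add a second RT list as a new column. "My_C18" and its values are hypothetical;
# NA marks an entry with no measured retention time.
known_library$My_C18 <- c(1.25, NA, 2.10)

# Write and re-read the csv to confirm that empty RT cells survive the round trip.
csv_path <- file.path(tempdir(), "known_library.csv")
write.csv(known_library, csv_path, row.names = FALSE, na = "")
reloaded <- read.csv(csv_path)

# In NetID_run_script.R, LC_method would name the column to use.
LC_method <- "My_C18"
rt_in_use <- reloaded[[LC_method]]
print(data.frame(name = reloaded$name, rt = rt_in_use))
```

Switching LC_method back to "Hilicon" selects the original RT list without touching the file.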
Customized MS2 spectral library
A limited number of MS2 spectra are included in the demo. You can adapt your own MS2 library to the existing format.
Score Setting
See Supplementary Note 2 of the NetID paper for an explanation.